A Fast and Reliable Policy Improvement Algorithm
Abstract
We introduce a simple, efficient method that improves stochastic policies for Markov decision processes. The computational complexity is the same as that of the value estimation problem. We prove that when the value estimation error is small, this method gives an improvement in performance that increases with certain variance properties of the initial policy and transition dynamics. Performance in numerical experiments compares favorably with previous policy improvement algorithms.
Similar resources
Coverage Improvement In Wireless Sensor Networks Based On Fuzzy-Logic And Genetic Algorithm
Wireless sensor networks are widely considered one of the most important 21st-century technologies and are used in many applications, such as environmental monitoring, security, and surveillance. They are used when it is not possible or convenient to run signaling or power supply wires to a sensor node, so the node must be battery powered. C...
VRED: An improvement over RED algorithm by using queue length growth velocity
Active Queue Management (AQM) plays an important role in Internet congestion control. It tries to enhance congestion control and to achieve a tradeoff between bottleneck utilization and delay. Random Early Detection (RED) is the most popular active queue management algorithm implemented in Internet routers, and it aims to provide low delay and low packet loss. RED al...
Diverse Exploration for Fast and Safe Policy Improvement
We study an important yet under-addressed problem of quickly and safely improving policies in online reinforcement learning domains. As its solution, we propose a novel exploration strategy, diverse exploration (DE), which learns and deploys a diverse set of safe policies to explore the environment. We provide DE theory explaining why diversity in behavior policies enables effective exploration ...
Improving Fast Charging Methods Using Genetic Algorithm and Coordination between Chargers in Fast Charging Station of Electric Vehicles in Order to Optimal Utilization of Power Capacity of Station
Fast charging stations are one of the most important sections of smart grids with high penetration of electric vehicles. One of the important issues in fast chargers is choosing the proper charging method. In this paper, by defining an optimization problem with the objective of reducing the charging time, the optimal charging levels are obtained using a multi-stage current method using a gen...
Publication date: 2016